33 research outputs found

    Predictive caching and prefetching of query results in search engines

    Get PDF
    We study the caching of query result pages in Web search engines. Popular search engines receive millions of queries per day, and ecient policies for caching query results may enable them to lower their response time and reduce their hardware requirements. We present PDC (probability driven cache), a novel scheme tailored for caching search results, that is based on a probabilistic model of search engine users. We then use a trace of over seven million queries submitted to the search engine AltaVista to evaluate PDC, as well as traditional LRU and SLRU based caching schemes. The trace driven simulations show that PDC outperforms the other policies. We also examine the prefetching of search results, and demonstrate that prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches. When integrating prefetching into PDC, we attain hit ratios of over 0:53.

    Distributed Exploration in Multi-Armed Bandits

    Full text link
    We study exploration in Multi-Armed Bandits in a setting where kk players collaborate in order to identify an ϵ\epsilon-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the kk players to communicate only once, they are able to learn k\sqrt{k} times faster than a single player. That is, distributing learning to kk players gives rise to a factor k\sqrt{k} parallel speed-up. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor kk speed-up in learning performance, with communication only logarithmic in 1/ϵ1/\epsilon

    Budget-Constrained Item Cold-Start Handling in Collaborative Filtering Recommenders via Optimal Design

    Full text link
    It is well known that collaborative filtering (CF) based recommender systems provide better modeling of users and items associated with considerable rating history. The lack of historical ratings results in the user and the item cold-start problems. The latter is the main focus of this work. Most of the current literature addresses this problem by integrating content-based recommendation techniques to model the new item. However, in many cases such content is not available, and the question arises is whether this problem can be mitigated using CF techniques only. We formalize this problem as an optimization problem: given a new item, a pool of available users, and a budget constraint, select which users to assign with the task of rating the new item in order to minimize the prediction error of our model. We show that the objective function is monotone-supermodular, and propose efficient optimal design based algorithms that attain an approximation to its optimum. Our findings are verified by an empirical study using the Netflix dataset, where the proposed algorithms outperform several baselines for the problem at hand.Comment: 11 pages, 2 figure

    Torts

    Get PDF

    PicASHOW: Pictorial authority search by hyperlinks on the web

    No full text
    We describe PicASHOW, a fully automated WWW image retrieval system that is based on several link-structure analyzing algorithms. Our basic premise is that a page p displays (or links to) an image when the author of p considers the image to be of value to the viewers of the page. We thus extend some well known link-based WWW page retrieval schemes to the context of image retrieval. PicASHOW’s analysis of the link structure enables it to retrieve relevant images even when those are stored in files with meaningless names. The same analysis also allows it to identify image containers and image hubs. We define these as Web pages that are rich in relevant images, or from which many images are readily accessible. PicASHOW requires no image analysis whatsoever and no creation of taxonomies for preclassification of the Web’s images. It can be implemented by standard WWW search engines with reasonable overhead, in terms of both computations and storage, and with no change to user query formats. It can thus be used to easily add image retrieving capabilities to standard search engines. Our results demonstrate that PicASHOW, while relying almost exclusively on link analysis

    Competitive Caching of Query Results in Search Engines

    Get PDF
    We study the problem of caching query result pages in Web search engines. Popular search engines receive millions of queries per day, and for each query, return a result page to the user who submitted the query. The user may request additional result pages for the same query, submit a new query, or quit searching altogether. An efficient scheme for caching query result pages may enable search engines to lower their response time and reduce their hardware requirements. This work studies query result caching within the framework of the competitive analysis of algorithms. We define a discrete time stochastic model for the manner in which queries are submitted to search engines by multiple user sessions. We then present an adaptation of a known online paging scheme to this model. The expected number of cache misses of the resulting algorithm is no greater than 4 times the expected number of misses that any online caching algorithm will experience under our specific model of query generation
    corecore